MixApart: Decoupled Analytics for Shared Storage Systems
Abstract
Data analytics and enterprise applications have very different storage functionality requirements. For this reason, enterprise deployments of data analytics run on a separate storage silo. This separation can generate additional costs and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos. We introduce MixApart, a scalable data processing framework for shared enterprise storage systems. With MixApart, a single consolidated storage back-end manages enterprise data and services all types of workloads, thereby lowering hardware costs and simplifying data management. In addition, MixApart delivers the local storage performance that analytics requires through an integrated data caching and scheduling solution. Our preliminary evaluation shows that MixApart can be 45% faster than the traditional ingest-then-compute workflow used in enterprise IT analytics, while requiring one third of the storage capacity required by HDFS.
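The abstract does not describe how caching and scheduling are integrated. As a rough illustration of the general idea only, the following is a minimal sketch of cache-aware task placement: a task goes to a compute node that already caches its input block, and otherwise to the least-loaded node, which then fetches the block from shared storage. The names `Node`, `schedule`, `free_slots`, and `cached_blocks` are hypothetical, and the policy is an assumption for illustration, not MixApart's actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node with task slots and a local block cache."""
    name: str
    free_slots: int
    cached_blocks: set = field(default_factory=set)


def schedule(tasks, nodes):
    """Assign each (task_id, block_id) pair to a node.

    Assumed policy (illustrative only): prefer a node with a free slot
    that already caches the block; otherwise pick the node with the most
    free slots and record the block in its cache, simulating a fetch
    from shared storage.
    Returns a list of (task_id, node_name, cache_hit) tuples.
    """
    plan = []
    for task_id, block_id in tasks:
        candidates = [n for n in nodes if n.free_slots > 0]
        if not candidates:
            break  # no capacity left; remaining tasks stay unscheduled
        local = [n for n in candidates if block_id in n.cached_blocks]
        if local:
            node, hit = local[0], True
        else:
            node, hit = max(candidates, key=lambda n: n.free_slots), False
            node.cached_blocks.add(block_id)  # simulated fetch into the cache
        node.free_slots -= 1
        plan.append((task_id, node.name, hit))
    return plan
```

In this sketch a second task over the same block lands on the node that fetched it first, so the shared storage system is read once per block rather than once per task.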
Publication date: 2012